SUPPORT VECTOR MACHINES

 

1. SVM with various kernels

 

The svm( ) command is in the package e1071.

 

> install.packages("e1071")        # needed only once

> library(e1071)

 

Let’s use support vector machines to classify cars into Economy and Consuming classes.
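
The examples below assume that the Auto data frame is loaded and attached, so that mpg, weight, and horsepower are visible directly. A minimal setup, assuming Auto comes from the ISLR package (any other source of the Auto data works too):

> library(ISLR)     # Auto ships with this package
> attach(Auto)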

 

> ECO = ifelse( mpg > 22.75, "Economy", "Consuming" )

> Color = ifelse( mpg > 22.75, "green", "red" )

> plot( weight, horsepower, lwd=3, col=Color )

 

The two classes cannot be perfectly separated by a hyperplane, but a soft-margin SVM still applies: it simply allows some points to violate the margin.
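
A quick, optional sanity check confirms the overlap; the weight ranges of the two classes intersect heavily:

> tapply( weight, ECO, range )     # min and max weight within each class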

 

> S = svm( ECO ~ weight + horsepower, data=Auto, kernel = "linear" )

Error in svm.default(x, y, scale = scale, ..., na.action = na.action) :

  Need numeric dependent variable for regression.

 

Error? The response ECO is a character vector, so svm( ) attempted regression rather than classification. For classification, the response must be a factor.
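
We can confirm the cause:

> class(ECO)
[1] "character"

So we build a reduced data frame containing a factor response and just the two predictors: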

 

> d = data.frame( ECO = factor(ECO), weight, horsepower )     # factor response => classification

 

> S = svm( ECO ~ weight + horsepower, data=d, kernel="linear" )

> summary(S)

 

Parameters:

   SVM-Type:  C-classification

 SVM-Kernel:  linear

       cost:  1

      gamma:  0.5

 

Number of Support Vectors:  120

 ( 60 60 )

 

So, there are 120 support vectors (points lying on the margin or violating it), 60 in each class.
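
The per-class counts are stored in the fitted object:

> S$nSV
[1] 60 60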

 

> plot(S, data=Auto)

Error in plot.svm(S, data = Auto) : missing formula.

 

Same story: plot.svm( ) wants the same reduced data frame that the model was fitted to (and, when there are more than two predictors, also a formula selecting the two plotting axes).

 

> plot(S, data=d)

 

 

This is the final classification with a linear kernel and therefore a linear decision boundary. Support vectors are plotted as “x”, the remaining points as “o”.

 

We can look at other types of kernels and boundaries – polynomial, radial, and sigmoid.

 

> S = svm( ECO ~ weight + horsepower, data=d, kernel="polynomial" )

> summary(S); plot(S,d)

Number of Support Vectors:  176

 

> S = svm( ECO ~ weight + horsepower, data=d, kernel="radial" )

> summary(S); plot(S,d)

Number of Support Vectors:  121

 

> S = svm( ECO ~ weight + horsepower, data=d, kernel="sigmoid" )

> summary(S); plot(S,d)

Number of Support Vectors:  74
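
For a quick side-by-side comparison, we can loop over all four kernels and print the training accuracy of each. A sketch (Sk is a throwaway name; training accuracy is optimistic, and honest estimates come from the cross-validation in the next section):

> for (k in c("linear","polynomial","radial","sigmoid")) {
+     Sk = svm( ECO ~ weight + horsepower, data=d, kernel=k )
+     cat( k, "training accuracy:", mean( fitted(Sk) == d$ECO ), "\n" )
+ }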

 

         

 

Adding more predictors should give a better fit, at least to the training data.

 

> S = svm( factor(ECO) ~ weight + horsepower + displacement + cylinders, data=Auto, kernel="linear" )

> summary(S)

Number of Support Vectors:  99
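
Its training accuracy is easy to check (in-sample, hence optimistic):

> mean( fitted(S) == ECO )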

 

We can identify the support vectors:

 

> S$index

 [1]  16  17  18  25  33  45  46  48  60  61  71  76  77  78  80 100 107 108 109

[20] 110 111 112 113 119 120 123 153 154 162 173 178 199 206 208 209 210 240 241

[39] 242 253 258 262 269 273 274 275 280 281 384  24  31  49  84 101 114 122 131

[58] 149 170 177 179 192 205 218 233 266 270 271 296 297 298 299 305 306 313 314

[77] 318 322 326 327 331 337 338 353 355 356 357 358 360 363 365 368 369 375 381

[96] 382 383 385 387

 

> Auto[S$index,]

     mpg cylinders displacement horsepower weight acceleration year origin

16  22.0         6          198         95   2833         15.5   70      1

17  18.0         6          199         97   2774         15.5   70      1

18  21.0         6          200         85   2587         16.0   70      1

25  21.0         6          199         90   2648         15.0   70      1

            < truncated >

 

 

2. Tuning and cross-validation

 

The “cost” option sets the penalty for violating the margin.
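
A small cost tolerates many margin violations (a wide margin and many support vectors); a large cost tolerates few. A quick sketch of the effect (Sc is a throwaway name):

> for (C in c(0.001, 1, 1000)) {
+     Sc = svm( ECO ~ weight + horsepower, data=d, kernel="linear", cost=C )
+     cat( "cost =", C, " support vectors:", sum(Sc$nSV), "\n" )
+ }

To pick the best value, we tune over costs 0.001, 0.01, 0.1, 1, 10, 100, 1000: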

 

> Stuned = tune( svm, ECO ~ weight + horsepower, data=d, kernel="linear", ranges=list(cost=10^seq(-3,3)) )

> summary(Stuned)

- sampling method: 10-fold cross validation

 

- best parameters:

 cost

  0.1

 

- best performance: 0.1173718

 

- Detailed performance results:

   cost     error dispersion

1 1e-03 0.2478205 0.10663023

2 1e-02 0.1432051 0.05485355

3 1e-01 0.1173718 0.04208311                      # This cost yielded the lowest cross-validation error of classification.

4 1e+00 0.1326282 0.04461101

5 1e+01 0.1351923 0.04819639

6 1e+02 0.1351923 0.04819639

7 1e+03 0.1351923 0.04819639

 

We can also find the optimal kernel.

 

> Stuned = tune( svm, ECO ~ weight + horsepower, data=d, ranges=list(cost=10^seq(-3,3), kernel=c("linear","polynomial","radial","sigmoid")) )

> summary(Stuned)

 

Parameter tuning of ‘svm’:

 

- sampling method: 10-fold cross validation

 

- best parameters:

 cost  kernel

  0.1 sigmoid

 

- best performance: 0.1046154

 

- Detailed performance results:

    cost     kernel     error dispersion

1  1e-03     linear 0.2164744 0.10501351

2  1e-02     linear 0.1326282 0.05074006

3  1e-01     linear 0.1096154 0.04330918

4  1e+00     linear 0.1172436 0.03813782

5  1e+01     linear 0.1223718 0.04775672

6  1e+02     linear 0.1223718 0.04775672

7  1e+03     linear 0.1223718 0.04775672

8  1e-03 polynomial 0.3720513 0.08274072

9  1e-02 polynomial 0.2601282 0.06438244

10 1e-01 polynomial 0.1987821 0.07443903

11 1e+00 polynomial 0.1784615 0.05328633

12 1e+01 polynomial 0.1580769 0.04909157

13 1e+02 polynomial 0.1555128 0.04999836

14 1e+03 polynomial 0.1504487 0.04722372

15 1e-03     radial 0.5816026 0.05687780

16 1e-02     radial 0.1301282 0.05190241

17 1e-01     radial 0.1198077 0.05104329

18 1e+00     radial 0.1223718 0.04118608

19 1e+01     radial 0.1096795 0.04835338

20 1e+02     radial 0.1198718 0.04184981

21 1e+03     radial 0.1146795 0.04354410

22 1e-03    sigmoid 0.5816026 0.05687780

23 1e-02    sigmoid 0.1530769 0.04517581

24 1e-01    sigmoid 0.1046154 0.03711533        # The best kernel and cost.

25 1e+00    sigmoid 0.1173718 0.04715638

26 1e+01    sigmoid 0.1530769 0.06159616

27 1e+02    sigmoid 0.1582051 0.06489946

28 1e+03    sigmoid 0.1582051 0.06489946
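
By default, tune( ) also keeps the model refit on the complete data with the winning parameters (see tune.control( best.model=TRUE )), so we could extract it directly:

> Stuned$best.parameters           # cost = 0.1, kernel = "sigmoid"
> Soptimal = Stuned$best.model

Refitting by hand with the chosen settings gives the same model and lets us inspect it: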

 

> Soptimal = svm( ECO ~ weight + horsepower, data=d, cost=0.1, kernel="sigmoid" )

> summary(Soptimal); plot(Soptimal,data=d)

Parameters:

   SVM-Type:  C-classification

 SVM-Kernel:  sigmoid

       cost:  0.1

      gamma:  0.5

 

Number of Support Vectors:  164                   # More support vectors make the fit depend less on any single point: lower variance, at the price of higher bias

 ( 82 82 )

 

Number of Classes:  2

Levels:   Consuming Economy

 

 

Let’s use the validation set method to estimate the classification rate of this optimal SVM.

 

> n = length(mpg);   Z = sample(n, n/2)        # random half for training; results vary with the split

> Strain = svm( ECO ~ weight + horsepower, data=d[Z,], cost=0.1, kernel="sigmoid" )

> Yhat = predict( Strain, newdata=d[-Z,] )     # predict on the held-out half

> table( Yhat, ECO[-Z] )

 

Yhat        Consuming Economy

  Consuming        82       9

  Economy          17      88

 

> mean( Yhat == ECO[-Z] )          # estimated classification rate on the test half

[1] 0.8673469

 

 

 

3. More than two classes

 

Let’s create more categories of ECO. The same tool svm( ) handles more than two classes, internally fitting one-against-one binary classifiers and deciding by majority vote.

 

> summary(mpg)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

   9.00   17.00   22.75   23.45   29.00   46.60

 

> ECO4 = rep("Economy",n)

> ECO4[mpg < 29] = "Good"

> ECO4[mpg < 22.75] = "OK"

> ECO4[mpg < 17] = "Consuming"
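
Equivalently, cut( ) produces the same four classes in one step and returns a factor right away; we keep the character version above so that the error below still appears. ECO4.f is a hypothetical alternative name:

> ECO4.f = cut( mpg, breaks=c(-Inf, 17, 22.75, 29, Inf),
+               labels=c("Consuming","OK","Good","Economy"), right=FALSE )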

 

> table(ECO4)

ECO4

Consuming   Economy      Good        OK

       92       103        93       104

 

> S4 = svm( ECO4 ~ weight + horsepower, data=d, cost=0.1, kernel="sigmoid" )

Error in svm.default(x, y, scale = scale, ..., na.action = na.action) :

  Need numeric dependent variable for regression.

 

R was trying to fit a regression SVM but found that the response ECO4 is not numeric. We can direct R to do classification by replacing ECO4 with factor(ECO4).

 

> S4 = svm( factor(ECO4) ~ weight + horsepower, data=d, cost=0.1, kernel="sigmoid" )

> plot(S4, data=d)

 

 

> Yhat = predict( S4, Auto )                   # in-sample predictions for all cars

> table( Yhat, ECO4 )

 

           ECO4

Yhat        Consuming Economy Good OK

  Consuming        88       0    2 33

  Economy           0      96   58 15

  Good              0       2    9  5

  OK                4       5   24 51

 

> mean( Yhat == ECO4 )

[1] 0.622449

 

It’s more difficult to predict finer classes correctly: with four categories, the (in-sample) classification rate drops to about 62%, and the intermediate classes are frequently confused with their neighbors.
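
The per-class rates, computed from the confusion matrix above, make this explicit:

> tab = table( Yhat, ECO4 )
> diag(tab) / table(ECO4)          # roughly 0.96, 0.93, 0.10, 0.49: "Good" is rarely recovered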